Economic and Fiscal Impacts of Proposed LNG Facility in Robbinston, Maine
The purpose of this study is to examine the economic and fiscal impacts of the proposed Downeast LNG facility on the Town of Robbinston, Washington County, and the State of Maine. The economic impact analysis focuses on the employment and income that are associated with the LNG facility construction and operations. The fiscal impact analysis considers additional local and state tax revenues associated with the facility, as well as increased local government expenditures that are projected to result from the LNG project. This report does not address the environmental, homeland security, or energy security impacts of the LNG facility. In addition, this report does not estimate any changes in the price of delivered natural gas in Maine that could potentially result from a new major energy supplier.
RCT Rejection Sampling for Causal Estimation Evaluation
Confounding is a significant obstacle to unbiased estimation of causal
effects from observational data. For settings with high-dimensional covariates
-- such as text data, genomics, or the behavioral social sciences --
researchers have proposed methods to adjust for confounding by adapting machine
learning methods to the goal of causal estimation. However, empirical
evaluation of these adjustment methods has been challenging and limited. In
this work, we build on a promising empirical evaluation strategy that
simplifies evaluation design and uses real data: subsampling randomized
controlled trials (RCTs) to create confounded observational datasets while
using the average causal effects from the RCTs as ground-truth. We contribute a
new sampling algorithm, which we call RCT rejection sampling, and provide
theoretical guarantees that causal identification holds in the observational
data to allow for valid comparisons to the ground-truth RCT. Using synthetic
data, we show our algorithm indeed results in low bias when oracle estimators
are evaluated on the confounded samples, which is not always the case for a
previously proposed algorithm. In addition to this identification result, we
highlight several finite data considerations for evaluation designers who plan
to use RCT rejection sampling on their own datasets. As a proof of concept, we
implement an example evaluation pipeline and walk through these finite data
considerations with a novel, real-world RCT -- which we release publicly --
consisting of approximately 70k observations and text data as high-dimensional
covariates. Together, these contributions build towards a broader agenda of
improved empirical evaluation for causal estimation.
Comment: Code and data at https://github.com/kakeith/rct_rejection_samplin
ComLittee: Literature Discovery with Personal Elected Author Committees
In order to help scholars understand and follow a research topic, significant
research has been devoted to creating systems that help scholars discover
relevant papers and authors. Recent approaches have shown the usefulness of
highlighting relevant authors while scholars engage in paper discovery.
However, these systems do not capture and utilize users' evolving knowledge of
authors. We reflect on the design space and introduce ComLittee, a literature
discovery system that supports author-centric exploration. In contrast to
paper-centric interaction in prior systems, ComLittee's author-centric
interaction supports curation of research threads from individual authors,
finding new authors and papers with combined signals from a paper recommender
and the curated authors' authorship graphs, and understanding them in the
context of those signals. In a within-subjects experiment that compares to an
author-highlighting approach, we demonstrate how ComLittee leads to a higher
efficiency, quality, and novelty in author discovery that also improves paper
discovery.
The Stellar Population of h and chi Persei: Cluster Properties, Membership, and the Intrinsic Colors and Temperatures of Stars
(Abridged) From photometric observations of 47,000 stars and
spectroscopy of 11,000 stars, we describe the first extensive study of
the stellar population of the famous Double Cluster, h and χ Persei, down
to subsolar masses. Both clusters have E(B-V) ~ 0.52--0.55 and dM =
11.8--11.85; the halo population, while more poorly constrained, likely has
identical properties. As determined from the main sequence turnoff, the
luminosity of M supergiants, and pre-main sequence isochrones, ages for h
Persei, χ Persei, and the halo population all converge on 14 Myr.
From these data, we establish the first spectroscopic and photometric
membership lists of cluster stars down to early/mid M dwarfs. At minimum, there
are 5,000 members within 10' of the cluster centers, while the entire h
and χ Persei region has at least 13,000 and as many as 20,000
members. The Double Cluster contains 8,400 M⊙ of stars
within 10' of the cluster centers. We estimate a total mass of at least 20,000
M⊙. We conclude our study by outlining outstanding questions regarding
the properties of h and χ Persei. From comparing recent work, we compile a
list of intrinsic colors and derive a new effective temperature scale for O--M
dwarfs, giants, and supergiants.
Comment: 88 pages, many figures, Accepted for publication in The Astrophysical
Journal Supplements. Contact lead author for version with high-resolution
figures
ARIES: A Corpus of Scientific Paper Edits Made in Response to Peer Reviews
Revising scientific papers based on peer feedback is a challenging task that
requires not only deep scientific knowledge and reasoning, but also the ability
to recognize the implicit requests in high-level feedback and to choose the
best of many possible ways to update the manuscript in response. We introduce
this task for large language models and release ARIES, a dataset of review
comments and their corresponding paper edits, to enable training and evaluating
models. We study two versions of the task: comment-edit alignment and edit
generation, and evaluate several baselines, including GPT-4. We find that
models struggle even to identify the edits that correspond to a comment,
especially in cases where the comment is phrased in an indirect way or where
the edit addresses the spirit of a comment but not the precise request. When
tasked with generating edits, GPT-4 often succeeds in addressing comments on a
surface level, but it rigidly follows the wording of the feedback rather than
the underlying intent, and includes fewer technical details than human-written
edits. We hope that our formalization, dataset, and analysis will form a
foundation for future work in this area.
Comment: 11 pages, 2 figures
Relatedly: Scaffolding Literature Reviews with Existing Related Work Sections
Scholars who want to research a scientific topic must take time to read,
extract meaning, and identify connections across many papers. As scientific
literature grows, this becomes increasingly challenging. Meanwhile, authors
summarize prior research in papers' related work sections, though this is
scoped to support a single paper. A formative study found that while reading
multiple related work paragraphs helps overview a topic, it is hard to navigate
overlapping and diverging references and research foci. In this work, we design
a system, Relatedly, that scaffolds exploring and reading multiple related work
paragraphs on a topic, with features including dynamic re-ranking and
highlighting to spotlight unexplored dissimilar information, auto-generated
descriptive paragraph headings, and low-lighting of redundant information. From
a within-subjects user study (n=15), we found that scholars generate more
coherent, insightful, and comprehensive topic outlines using Relatedly compared
to a baseline paper list.
Autoregulation of yeast ribosomal proteins discovered by efficient search for feedback regulation
Post-transcriptional autoregulation of gene expression is common in bacteria but many fewer examples are known in eukaryotes. We used the yeast collection of genes fused to GFP as a rapid screen for examples of feedback regulation in ribosomal proteins by overexpressing a non-regulatable version of a gene and observing the effects on the expression of the GFP-fused version. We tested 95 ribosomal protein genes and found a wide continuum of effects, with 30% showing at least a 3-fold reduction in expression. Two genes, RPS22B and RPL1B, showed over a 10-fold repression. In both cases the cis-regulatory segment resides in the 5' UTR of the gene as shown by placing that segment of the mRNA upstream of GFP alone and demonstrating it is sufficient to cause repression of GFP when the protein is over-expressed. Further analyses showed that the intron in the 5' UTR of RPS22B is required for regulation, presumably because the protein inhibits splicing that is necessary for translation. The 5' UTR of RPL1B contains a sequence and structure motif that is conserved in the binding sites of Rpl1 orthologs from bacteria to mammals, and mutations within the motif eliminate repression.
CiteSee: Augmenting Citations in Scientific Papers with Persistent and Personalized Historical Context
When reading a scholarly article, inline citations help researchers
contextualize the current article and discover relevant prior work. However, it
can be challenging to prioritize and make sense of the hundreds of citations
encountered during literature reviews. This paper introduces CiteSee, a paper
reading tool that leverages a user's publishing, reading, and saving activities
to provide personalized visual augmentations and context around citations.
First, CiteSee connects the current paper to familiar contexts by surfacing
known citations a user had cited or opened. Second, CiteSee helps users
prioritize their exploration by highlighting relevant but unknown citations
based on saving and reading history. We conducted a lab study that suggests
CiteSee is significantly more effective for paper discovery than three
baselines. A field deployment study shows CiteSee helps participants keep track
of their explorations and leads to better situational awareness and increased
paper discovery via inline citation when conducting real-world literature
reviews.
Reticulation, divergence, and the phylogeography–phylogenetics continuum
Phylogeography, and its extensions into comparative phylogeography, have their roots in the layering of gene trees across geography, a paradigm that was greatly facilitated by the nonrecombining, fast evolution provided by animal mtDNA. As phylogeography moves into the era of next-generation sequencing, the specter of reticulation at several levels (within loci and genomes in the form of recombination, and across populations and species in the form of introgression) has raised its head with a prominence even greater than glimpsed during the nuclear gene PCR era. Here we explore the theme of reticulation in comparative phylogeography, speciation analysis, and phylogenomics, and ask how the centrality of gene trees has fared in the next-generation era. To frame these issues, we first provide a snapshot of multilocus phylogeographic studies across the Carpentarian Barrier, a prominent biogeographic barrier dividing faunas spanning the monsoon tropics in northern Australia. We find that divergence across this barrier is evident in most species, but is heterogeneous in time and demographic history, often reflecting the taxonomic distinctness of lineages spanning it. We then discuss a variety of forces generating reticulate patterns in phylogeography, including introgression, contact zones, and the potential selection-driven outliers on next-generation molecular markers. We emphasize the continued need for demographic models incorporating reticulation at the level of genomes and populations, and conclude that gene trees, whether explicit or implicit, should continue to play a role in the future of phylogeography.
Organismic and Evolutionary Biology